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We describe an algorithm for the sequential sampling of entries 
in multiway contingency tables with given constraints. The algorithm 
can be used for computations in exact conditional inference. To jus- 
tify the algorithm, a theory relates sampling values at each step to 
properties of the associated toric ideal using computational commu- 
tative algebra. In particular, the property of interval cell counts at 
each step is related to exponents on lead indeterminates of a lexico- 
graphic Grobner basis. Also, the approximation of integer program- 
ming by linear programming for sampling is related to initial terms 
of a toric ideal. We apply the algorithm to examples of contingency 
tables which appear in the social and medical sciences. The numer- 
ical results demonstrate that the theory is applicable and that the 
algorithm performs well. 

1. Introduction. Sampling from multiway contingency tables with given 
constraints can be used to compute exact Monte Carlo p- values of goodness- 
of-fit and parameter significance for conditional inference. This is desirable 
when the tables of interest are numerous but have entries that raise doubts 
about the validity of asymptotic methods. A classical application is testing 
for Hardy- Weinberg equilibrium with multiple alleles, where some alleles 
may be quite rare and result in sparse tables [20]. Other applications are 
described in [2, 6, 13]. A more general problem is sampling from nonnegative 
integer lattice points. This includes contingency tables, and further appli- 
cations such as Monte Carlo EM algorithms with incomplete data [31] and 
Bayesian computation of posterior distributions [30]. 
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Markov chain Monte Carlo (MCMC) has been a popular technique for 
generating random samples from tables with given constraints. It is usually 
easy to program, does not require a lot of memory, and has wide applica- 
bility. Diaconis and Sturmfels [14] gave algebraic characterizations of the 
moves necessary to run such a Markov chain. However, for some loglinear 
models the constraints from sufficient statistics on multiway tables make it 
difficult to design irreducible Markov chains. Diaconis and Sturmfels [14] 
gave a method to produce Markov moves that connect all tables with given 
constraints, but in some practical cases, such as large logistic regression 
examples, the moves cannot be computed. It is sometimes possible to do 
computations with a smaller collection of moves by letting some entries in 
the space of tables go negative. This idea is used in [4, 7]. The cost is a 
longer running time for the Markov chain. In general, the running times of 
these Markov chains are very difficult to judge. Therefore, Markov chains 
have three disadvantages: (1) they can be hard to design, (2) they can take 
a long time to run to stationarity, and (3) the time to run to stationarity 
may not be clear. 

Sequential importance sampling (SIS) avoids these disadvantages of Markov 
chains because it is relatively easy to implement and there is no issue of con- 
verging to a stationary distribution. Chen et al. [6] introduced an SIS pro- 
cedure for simulating two-way zero-one and contingency tables with fixed 
marginal sums, which compares favorably with other existing Monte Carlo- 
based algorithms. Similar techniques have also been applied to a logistic 
regression problem in [7]. This paper shows that SIS can be implemented 
efficiently for many multiway contingency table problems that have been 
studied mostly with Markov chains. 

The idea behind SIS is to sample cell entries in the contingency table 
one after the other so that the final joint distribution (i.e., the proposal dis- 
tribution) is close to the target distribution. SIS does not have the same 
disadvantages as a Markov chain, because the method terminates at the last 
cell and generates i.i.d. samples from the proposal distribution. However, SIS 
raises a new set of implementation issues. The main problems are approx- 
imating the support of the marginal distribution of each cell quickly, and 
then approximating the marginal distribution on the support set with a pro- 
posal distribution. We show how properties of the sampling set at each step 
can be deduced from algebraic conditions on a collection of Markov moves. 
The results of this paper extend the applicability of SIS from two-way ta- 
bles [6] to a wider range of multiway tables and allow further comparison 
with Markov chain methods. 

The target distribution on the collection of tables may be hypergeomet- 
ric, which arises in conditional inference with multinomial sampling, or it 
may be another related distribution such as the one for Hardy- Weinberg 
proportions. SIS can yield an approximate count of constrained tables very 
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quickly when the target distribution is uniform. This application has been 
carried out in [6], where SIS was shown to be more efficient than Markov 
chains for counting and testing two-way tables. Combinatorists are inter- 
ested in counting tables with given constraints [11]. Counting tables is also 
related to conditional volume tests [13]. In our multiway examples, we found 
approximate counts of tables without difficulty The exact counting software 
LattE [11] confirmed the counts on the two smaller examples. The uniform 
target distribution is also useful for Bayesian applications where a uniform 
prior on probabilities leads to equally likely tables, and for the conditional 
volume test [13]. 

The paper is organized as follows. In Section 2 we introduce essential 
ideas of SIS. The algebraic conditions for efficient sampling are formulated 
in Sections 3 and 4. Many of the algebraic ideas of Markov chains on lattice 
points are used. Section 3 treats the basic case where properties of polynomi- 
als generating the toric ideal are related to SIS. Section 4 is more technical 
and develops stronger methods for subsets of the Markov basis. These re- 
sults can apply when the observed margins imply conditions of positivity on 
the tables constrained by the margin values. 

Section 5 is about the relationship between linear programming (LP) and 
integer programming (IP). When the support of the marginal cell distribu- 
tion is an interval of integers [l,u], a situation established under conditions 
in Sections 3 and 4 and which occurs often in practice, one needs the values 
of the upper and lower bounds. Knowing then that LP and IP give nearly the 
same answer is important, because using an IP algorithm at each step in the 
procedure would be much slower than using LP. A precise algebraic relation- 
ship between LP and IP is developed in [22], which gives an algorithm for 
finding the maximum difference between the two over all conceivable data 
sets. The results here may be easier to apply in some examples. In practice 
it is not essential that LP and IP be identical. Section 6 discusses sampling 
distributions for different target distributions. In Section 7 we give a range of 
examples to show how well SIS can work in real problems. Section 8 provides 
concluding remarks. 

2. Elements of SIS. Let denote the set of all contingency tables with 
given constraints. Assume f2 is nonempty. The p-value for conditional infer- 
ence on contingency tables can often be written as 



where p(n) is the underlying distribution on f2, which is usually uniform or 
hypergeometric and only known up to a normalizing constant, and /(n) is 
a function of the test statistic. For example, if we let 





(2) 
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where no is the observed table, formula (1) gives the p-value of the exact 
test [20]. In many cases sampling from p(n) directly is difficult. The im- 
portance sampling approach is to simulate a table n € £1 from a different 
distribution q(-), where q(n) > for all ne!l, and estimate /i by 

(3) t = Sili /(nf)p(ni)/g(n«) 

EliP(ni)/q(rH) ' 

where ni, . . . , n^r are i.i.d. samples from q(n). We can also estimate the total 
number of tables in f2 by 

i N n 

because \Q\ = J2nen ^Tjli 11 )- The underlying distribution on U correspond- 
ing to this case is uniform. 

In order to evaluate the efficiency of an importance sampling algorithm, 
we can look at the number of i.i.d. samples from the target distribution that 
are needed to give the same standard error for jl as N importance samples. 
A rough approximation for this number is the effective sample size [24] 

N 

(5) ESS = 7— "2. 

1 + cv z 

where the coefficient of variation (cv) is defined as 

2 _ vav q {p(n)/q(n)} 



(6) cv 2 



£2{p(n)Mn)} 



Accurate estimation generally requires a low cv 2 , that is, <7(n) must be suf- 
ficiently close to p(n). We will use cv 2 as a measure of efficiency for an 
importance sampling scheme. In practice, the theoretical value of cv 2 is un- 
known, so its sample counterpart is used to estimate cv 2 . The standard error 
of fi or |0| can be simply estimated by further repeated sampling [6]. 

SIS as it applies to multiway tables fills in the entries of a table cell by 
cell, in a way that guarantees that every table in can be produced. More 
precisely, we stack all entries of the table into a long vector n, and start by 
sampling the first cell count n\ of the vector n with a proposal distribution 
q(ni). Conditional on the realization of the first cell, we sample the second 
cell count with a proposal distribution q[n^n\), and then move forward 
sequentially until all the cells are sampled. Denoting the cell counts of n by 
nx, . . . , rid, we can write the joint proposal distribution q as 

q((n 1} . ..,n d )) = q(ni)q(n 2 \ni)q(ns\n 2 , n{) ■ ■ -q(n d \n d _i, . . . ,m). 

Ideally, one would like to sample a cell value from the marginal distribution 
of a cell entry, conditional on the entries that have already been sampled. 
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However, these marginal distributions are quite difficult to compute explic- 
itly except in very small examples. SIS then raises some problems if it is 
to be used effectively: (1) When and how can the support of the marginal 
distribution nj|(rij_i, . . . ,n\) be quickly determined or approximated? (2) 
How can the support of the marginal distribution be sampled with a pro- 
posal distribution q that is close to the true underlying distribution pi We 
address these questions in the following sections of the paper. 

3. Sequential intervals and algebra. When they apply SIS to the prob- 
lem of sampling two-way contingency tables with fixed marginal sums, Chen 
et al. [6] notice that the support of the marginal distribution m\(ni-i, ... ,n\) 
is an interval of integers [here n = (n\, . . . , n^) is the table in a vector for- 
mat]. Therefore, they can sample a value from the interval at each step and 
always produce a table in Q, that is, every table satisfies the constraints. 
This saves a lot of computing time compared to rejection sampling. Another 
advantage of having this interval property is that one can find a good pro- 
posal distribution g(nj|nj_i, . . . , m) more easily than in the situation where 
there are gaps in the support set. 

SIS tends to perform better when the sequential interval property holds, 
but for general constraints on multiway tables, it is not always true that 
one can fill in entries in sequence and expect the range of feasible values to 
be an interval of integers. Examples where the sequential interval property 
does not hold are very sparse logistic regression [7] , many 3- way tables with 
certain margin constraints (see [12] for the full range of difficulties with 
3-way tables) and some triangular tables of genotype data when cells are 
sampled in certain orders. Typically, there may be a problem if the moves of 
a Markov basis involve changes in some entry that are of size ±2 or larger. 
A precise condition is more complicated and weaker than "no moves of size 
greater than 1," and may depend on the margin values and the order of the 
sequential sampling. In this section we give the basic theorems that are not 
related to the actual values of the margin constraints. In the next section 
we strengthen the results. 

Now we introduce notation for lattice points and the algebra of polyno- 
mials that will be used in our study of SIS. Let A be an r x d matrix of 
nonnegative integers, denoted Z + . In applications d is the number of cells 
in the table, and r is the number of parameters (not necessarily free) in an 
exponential family model. A is often referred to as the constraint matrix 
and r is the total number of constraints. We assume that a sum of some 
nonempty subset of the rows of A is a strictly positive vector. In applica- 
tions with multinomial sampling, this will be immediate because the sample 
size is fixed, so the constant vector of ones is a row or is in the row space of 
A. For teZl, let 

A -1 [t] :={nGZf:in = t). 
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This is a collection of tables with linear constraints, that is, the set of non- 
negative integer points inside a polytope. The linear constraint value t will 
sometimes informally be called a margin constraint. The value of t will typ- 
ically be the sufficient statistics for a loglinear model. Our primary goal is 
to sample from .A _1 [t]. 

Let us first recall the notion of a Markov move on ^4 _1 [t]. If m € ker^(A) 
(the null space of A in the integers), then m is a Markov move. With a 
collection of such moves, one can define a symmetric Markov chain on A _1 [t] 
by starting at an initial state n £ -A -1 [t], and then uniformly choosing one 
of the moves m and a sign on the move, and then moving to the new state 
n±m if this new vector is nonnegative (i.e., every entry is nonnegative) . A 
Markov basis Ma for A is a subset of ker^(^4) such that, for each pair of 
vectors u, v € Z+ with Au = Av, there is a sequence of vectors rtij £ Ma, 
i = 1, . . . , I, such that 

l 

u = v + ^mi, 

i=i 

3 

0<v + ^mj, j = l,...,l. 

i=l 

That is, two nonnegative vectors with the same linear constraints can be 
connected with a sequence of increments from Ma while always maintaining 
the linear constraints and the nonnegativity. 

Define the polynomial ring Q[x\, . . . , Xd] in indeterminates (polynomial 
variables) x\, . . . ,Xd, one for each cell. Define the toric ideal 

I A :={x n -x m :An = Am), 

where the usual monomial notation for a nonnegative 

integer vector of exponents n = (m, . . . , rid). The way to go between Markov 
moves and polynomials is simple: order and number the cells in the table, 
create an indeterminate (polynomial variable) for each cell in the table, and 
put the positive Markov move cell values on one monomial, put the negative 
values on another monomial, then form the difference. For example, the 
Markov move (1,-1,-1,1)' can be denoted as X1X4 — £2X3. The choice of 
cell ordering can be important, as in Example 7.5. 

There are two fundamental algebraic ideas related to Markov bases. For 
m £ Z d , define m + = max{0, m}, m~ = max{0, — m}, so m = m + — m~. 
The first fundamental result, shown by Diaconis and Sturmfels ([14], Theo- 
rem 3.1), is that a finite generating set of binomials {x" 1 ^ — x m i , i = 1, . . . ,g} 
for I a defines Markov moves ±(m^" — m^~), i = 1, . . . ,g, that are a Markov 
basis in that they connect all of A _1 [t] when chosen randomly as vector in- 
crements, regardless of the actual value of t. In other words, a Markov basis 
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always exists independently of the actual values of the linear constraints. 
The second fundamental result ([29], Theorem 8.14) is that a collection of 
moves will connect two tables n and m if x n — x m E /, where I is the ideal 
generated by the collection of moves. This is used to show connectivity for 
subcollections of the full Markov basis for particular values of t in Section 4. 

Definition 3.1. Define the projection operator tt\ : Z d —> Z by 7Ti {z\ , . . . , 

Zd) = zi. 

Lemma 3.1. Suppose a Markov basis Ma satisfies tti{Ma) C {—1,0, +1}. 
Then 7Tx(A _1 [t]) is an interval of integers \Li,u\\. 

Proof. One can connect tables m, n G ^4 _1 [t] with values mi and m in 
the first coordinate by changing the first coordinate only ±1 at each step, 
so the gap between possible values cannot be greater than 1. □ 

If the columns of A are ai, . . . , a^, let A{ = (aj, a^+x, . . . , a^) be the matrix 
that deletes the first i — 1 columns and keeps the last d — i + 1 columns of 
A. 

Definition 3.2. The polytope ^4 _1 [t] has the sequential interval prop- 
erty if 7Ti(74 _1 [t]) is an interval of integers [Zx,tti], and for i = 1, . . . ,d — 1: 
if m G wi(A^ 1 [t - nxax - • • • - n^a^x]), then -Ki{A~^ x \t - ni&i - ■■■ - 
nj_iai_x — n i a i]) is also an interval of integers [/j+x, itj+x]- 

The next result is the most basic connection between the sequential in- 
terval property and the exponents of a lex basis for the toric ideal. An 
important point is that the condition does not require that all exponents 
in the Markov basis have magnitude at most 1. Rather, it requires that the 
exponent be at most 1 on the indeterminate X{ (square-free in x.;) on the 
moves that involve only the present and future cells i, i + 1, . . . , d in the lex 
basis. This point is important for many examples, including 3x3x3 tables 
with no-3-way interaction (Example 7.4). 

With a particular cell order, the indeterminates are typically ordered x± > 
x 2 > • • ■ > Xd, and then one can introduce term orders. We primarily use the 
lexicographic term order (lex order), which totally orders monomials (or, 
equivalently, their vector exponents corresponding to tables) by declaring 
x n > x m if and only if the first entry from the left in n — m is positive (or 
n is after m in the dictionary sense). Cox, Little and O'Shea ([10], page 52) 
explain term orders, including the gi'evlex order that we use in Section 5 
where the indeterminates are taken in reverse order Xd > x^-i > ■ ■ • > x\. 
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In the following, we use the term "Grobner basis," which is a special gen- 
erating set for an ideal ([10], page 74). Lex Grobner basis (or lex basis) will 
mean Grobner basis with respect to lexicographic term order ([10], page 54) 
and reduced Grobner basis is a unique representation ([10], page 90). 

Proposition 3.1. Suppose a Markov basis Ma = {±mi, . . . , ±m s } has 

the property that G := {x 1 ™* — x™ 1 * , i = 1, . . . ,g} is a lex Grobner basis with 
ordering x\ > X2 > • • ■ > Xd on indeterminates and suppose the elements 
of G H Q[xi, . . . , xj] are square-free in x% for each i. Then A~ 1 [t] has the 
sequential interval property for all t. 

Proof. By the elimination theorem ([10], page 113), the lex basis G 
has the property that G n Q[xi, . . . ,Xd] is a Grobner basis for the ideal 
lA t = (x m — x n , A^m = Ajn). Hence, by Theorem 3.1 of Diaconis and Sturm- 
fels [14], the difference of the exponents (together with signs ±) of elements 
in G R Q[xi, . . . , Xd] is a Markov basis with in coordinates 1, 2, . . . ,i — 1. An 
application of Lemma 3.1 to the matrix Ai with first coordinate n« completes 
the proof. □ 

When using this result, some orders on the cells may have the square- 
free property and others may not, so it can be used to find good orderings 
on the cells. The sensitivity to cell ordering shows up in many examples, 
including logistic regression and Hardy- Weinberg testing with genotype data 
(Example 7.5). 

In fact, the converse to Proposition 3.1 is also true, in the sense that 
matrices A, such that A _1 [t] has the sequential interval property regardless 
of t, are characterized by their lex Grobner bases. 

Proposition 3.2. Let A be a nonnegative integer matrix such that 
A _1 [t] has the sequential interval property for all t. Then the reduced lex 
Grobner basis G for I a with ordering x\ > X2 > • • • > Xd has GC\ Q[xi, . . . Xd] 
square-free in X\ for all i. 

Proof. It suffices to prove the claim on the first cell, the rest following 
by induction. Let G := {x m i — x m ; } be the reduced lex Grobner basis. In 
particular, none of the monomials x m ^ is divisible by the leading monomial 
of any other binomial in I a- Suppose there is some x m — x m € G with 
7Ti(m + ) = a > 1. Let t = Am + . Since A _1 [t] has the sequential interval 
property and 7Ti(m~) = 0, there exists n € A" 1 ^] with 7Ti(n) = a — 1. Then 
the binomial xj~ a+1 (x m+ — x n ) € I a is not equal to x m — x m , and has 
leading term xj~ a+1 x m+ which divides x m+ . This is a contradiction and 
x m+ — x m is not in the reduced Grobner basis G. □ 
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4. Markov subbases. In this section we give results that can be used 
when the full Markov basis does not have the required properties to guar- 
antee sequential intervals. Situations where this occurs include logistic re- 
gression [7] and Example 7.3, where the lex bases for the toric ideals do not 
have the conditions of Proposition 3.1. 

The results in this section use the particular values of the margin con- 
straints, which may allow a smaller and simpler connecting set that we call 
a Markov subbasis. An existing method to study connectivity properties of 
subsets of a Markov basis is the primary decomposition ([10], page 208). 
While useful in some examples, it is usually quite difficult to compute. The 
methods in this section use computational tools that are more easily applied 
in many cases. 

To motivate some of the ideas that follow, recall that in some contingency 
tables it is possible to easily identify a reasonable collection of Markov moves 
that preserve the required constraints and are a basis in the linear algebra 
sense for the kernel of the constraint matrix. However, a basis in the linear 
algebra sense does not always give a Markov basis — the Markov basis al- 
lows you to connect all tables while remaining nonnegative, a condition not 
guaranteed by the linear algebra basis. The smaller collection, while not a 
Markov basis, may connect tables with certain margin values while remain- 
ing nonnegative. The linear algebra basis can be enlarged to a Markov basis 
by a process called saturation discussed below, and the result can be much 
more complicated than the original collection of moves. 

A lex basis for the toric ideal I a for a constraint matrix A is quite special 
in that the Markov moves that involve cells i, i + 1, . . . , d are a lex basis for 
the toric ideal for 1^ ■ This is a consequence of the elimination theorem, and 
means that one lex basis calculation gives sequential sampling information 
about all the cells in sequence. With a collection of moves smaller than a 
lex basis for I a, the theory is more difficult. 

A Markov subbasis M^,t for t € and integer matrix A is a finite 
subset of 'kerz(A) such that, for each pair of vectors u,v € A _1 [t], there 
is a sequence of vectors rrij £ M^ t , i = 1, . . . , I, such that 

l 

u = v + ^rrij, 

i=l 

3 

0< v + ^m*, 

i=i 

The connectivity through nonnegative 
for this specific t. 



j = l,...,l. 
lattice points only is required to hold 



Lemma 4.1. Suppose a Markov subbasis M^,t satisfies TTx(M^t) C {— 1, 
0,+l}. Themr^A- 1 ^}) is an interval of integers [h,ui\. 
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Proof. One can connect tables with feasible values n\ and mi in the 
first coordinate by changing the first coordinate only ±1 at each step, so the 
gap between possible values cannot be greater than 1. □ 

The following proposition is used in Examples 7.3 and 7.4, where Propo- 
sition 3.1 cannot be used. Recall that a lex basis for a toric ideal has the 
property that each elimination ideal ([10], page 113) is also a lex basis for a 
remaining toric ideal, so applying Lemma 4.1 in sequence is immediate. With 
a subbasis, however, one must add a technical condition involving saturation 
to get the sequential application of Lemma 4.1. 

Saturation (see [28], page 113 or [25], page 215) is an algebraic procedure 
that enlarges an ideal. In our case the ideal will correspond to a collection 
of Markov moves possibly less than a full Markov basis. If I is an ideal in 
the ring Q[x\, . . . ,Xd] and / is a polynomial, then the saturation of I by / 
(denoted I : f°°) is defined by 

I:f°°:={g£Q[x 1 ,...,x d ]:f k -g£l for some k > 0}, 

which is also an ideal. For the indeterminate the collection of 

polynomials g such that x\g is in the ideal I for some choice of the exponent 
k. 

Proposition 4.1. Suppose M^t is a Markov subbasis, let M^t = {±mi, 
. . . ,±m g } and let G := {x m ^ — x m i ,i = 1, . . . ,g}. Suppose G has the follow- 
ing three properties: (1) G is a lex Grobner basis for the generated ideal Im a t 
with order x\ > X2 > ■ • ■ > Xd on indeterminates; (2) G n Q[xi, . . . ,Xd] are 
square-free in Xj for each i; and (3) (Im a t : X T) nQ^i+i, • • • , Xd] C Im a t f or 
each i = 1, 2, . . . , d — 1. Then the polytope A" 1 ^] has the sequential interval 
property. 

Proof. By Lemma 4.1, 7ri(74 _1 [t]) is an interval. We must show that 
two tables in yl" 1 ^] with a common entry in coordinate 1 can be connected 
with moves in M&,t without touching coordinate 1. To see this, suppose 
tables u', v' S ^4 _1 [t] have common first coordinate u\ = v\ = c. 

Let u = (0, U2,uz, . . . , Ud), v = (0, ^2,^3, • • • , Vd)- We must show that x u — 
x v G (G C\ Q[x2, ■ ■ ■ , Xd]) to be able to connect them with moves in G that 
only involve changing the second coordinate (by only ±1 at each step). Since 
G is a lex basis, (G n Q[x2, • • • , x^]) = Im a t H Q[%2, ■ ■ ■ ,Xd], and it is enough 
to show that x u — x v G Im a t ■ We have that x u — x v E Ia- 

Since x\ (x u - x v ) = x u ' - x v ' G I MAt , the binomial x u - x v G (hi At : xf) n 
Q[x 2 , . . . ,xa]. Under the assumption (hi At : xf) n Q[x 2 , ...,x d ]C hi Att , the 
first step is proven. 
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Suppose now that two tables u', v' € A" 1 [t] have common first two coordi- 
nates u\ = v\ = ci, U2 = V2 = C2. Let u = (0, 0, 113, u±, . . . , Ud), v = (0, 0, vs,v^, 
. . . , Vd). We must show that x u — x v G (G fl Q[x3, . . . , 2^]) to be able to 
connect them with moves in G that only involve changing the third coor- 
dinate (by only ±1 at each step). By the argument above, we have that 
x^x u — X2 2 x v E J- Ma t • Then by the saturation condition on xi, x u — x v € 
(I MA , t :xf)nQ[x 3 ,.'..,x d ]c M A ,t- 

The argument continues likewise for each cell in the order 1,2, ... ,d. □ 

To use Proposition 4.1, one must have in hand a Markov subbasis, which 
requires knowing some connectivity properties. These can be established 
sometimes with ad hoc arguments or with the primary decomposition of 
the ideal lM At - Lemma 4.2 below is a new method to verify a Markov 
subbasis, and we use it in Example 7.3. The quotient ":" operation is defined 
by I : f := {g : f ■ g £ I}, the result of one step of the saturation procedure 
defined above. 

Lemma 4.2. Let M C kerz(A) be Markov moves with ideal Im- Suppose 
each element n € A _1 [t] satisfies n s > for all s £ S C {1, . . . , d}, and sup- 
pose that (Im '-Yl s &s x s) = 1 A> the toric ideal. Then the moves in M connect 
all of A _1 [t] and are therefore a Markov subbasis. 

Proof. Let u, v e A~ x [t] , and let u' = u — 1$, v' = v — Is, where Is is 
the vector with 1 in the coordinates that are in the set S, and elsewhere. 
Clearly, x u — x v 6 I a, so by the saturation assumption (x u — x v ) IlseS x s G 
Im- The fundamental result of Diaconis and Sturmfels ([14], Theorem 3.1) 
says that the moves in M connect u = u' + Is with v = v' + Is through the 
nonnegative tables. □ 

5. Bounds on cell entries. When the conditions for sequential interval 
property are met, the next question is how to quickly determine or approxi- 
mate the upper and lower bounds of the interval [l,u]. In very special cases 
one can use known formulas for the interval, such as the Frechet bounds. 
This works in two-way tables and some decomposable graphical models [6]. 
For general multiway tables, usually no simple formula is available to com- 
pute the bounds. Three general ways to determine or approximate the upper 
and lower bounds of the interval [l,u] are integer programming (IP), linear 
programming (LP) and the shuttle algorithm. IP always gives the exact 
integer bounds I and u, but it is much slower than the other two methods. 

LP in the rational numbers can dynamically find bounds on the interval at 
each step in the sampling. LP is much faster than IP, and under conditions 
that hold in many examples, LP gives the same answer as IP. The conditions 
we formulate are concrete algebraic conditions that can be checked with 
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a preliminary calculation. Hosten and Sturmfels [22] study the difference 
between LP and IP from a different point of view. They give the largest 
possible difference over all constraint values, whereas our results use the 
particular constraint values of the data set. 

The numerical implementation of LP to determine an interval [l,u] must 
be done carefully. LP sometimes gives wider intervals than the true interval 
because LP considers solutions in a larger space. Roundoff of numerical 
approximations that come from floating point operations or interior point 
methods can result in sampling a number out of the feasible range [I, u] 
or into a strict subset of the feasible range which can lead to errors. The 
program that we embedded into the sampling code and that worked well is 
IpSolve [1]. 

A third way to approximate the intervals is the shuttle algorithm, de- 
scribed in [5] and [16]. This is an iterative method that usually does not 
give exact IP results, but it has two advantages in special cases: it is fast 
and easy to program, and it can be implemented without explicitly con- 
structing a constraint matrix, a task which may be impossible for very large 
problems with millions of cells. In our numerical examples LP works better 
than the shuttle algorithm, in some cases much better. 

Consider the IP and LP problems 

Uj(h) := m&x{rij : Ajii = b, n £ }, 

lj(b) := min{nj : Ajii = b, n € Z+}, 

Uj(b) := max{(/j : A,q = b, q € Q+}, 

Lj(b) :=min{g i :^ i q = b,q€ Q d + }, 

where Z + , Q + are the nonnegative integers and nonnegative rational num- 
bers, respectively. We are interested in bounding the nonnegative quantities 
Uj — Uj and lj — Lj . 

In Propositions 5.1 and 5.2 that follow, we use the relationship between 
lower and upper IP bounds and normal forms with respect to lex and grevlex 
term orders explained in [9] and stated in Algorithm 5.6 of [28], page 43. 
For the following proposition, let ^4g X [t] := {q G : Aq = t}, the set of 
nonnegative rational vectors with constraints t. 

Proposition 5.1. Suppose a Markov subbasis M^t = {±mi, . . . , ±m 9 } 
has the property that G := {x m » + — x m » , i = 1, . . . ,g} is a lex Grobner ba- 
sis with ordering x\ > X2 > • • • > on indeterminates for the generated 
ideal lM At - Also, suppose lM At : l~Leis Xs = ^ A > w here Sq is the collection 
of coordinates which are always positive for elements in Aq 1 [t] , and suppose 
I MAit ■ x°° n Q[x i+ i, ...,x d ]c lM A>t for each i = 1, 2, . . . , d - 1. 
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If the coordinate values of all mf (i = 1, . . . , g) are in {0, 1}, then lj(tj) = 
Lj (tj) for all j = 1,2, ... ,d and all tj given by t% = t, tj = t — aim — a2«2 — 
a j-i n j-i> 3 = 2 > • • ->d. 

Proof. We show first the result that 1\<L\. Let m 6 -A _1 [t]. 

Use long division to compute the normal form of x m with respect to 
Im a t • Let the normal form be the monomial x n . It is nearly immediate 
that n*>li, since the first coordinate of the normal form when dividing by 
a Grobner basis for the full ideal I a is l\. 

Let q* solve L\ = min{gi : Aq = t, q € Q%}- We show that q\ > n*, which 
together with n\ > l\ will prove the result L\ = l\. 

Suppose by way of contradiction that n\ > q*. Since q* is rational, an 
integer multiple, say Aq*, is integral. Then A(Xq*) = A(Xn*), so x An — 
x Aq £ I a - Furthermore, by the assumption of positivity of coordinates Sq on 
elements in Aq 1 [t], it follows that q*, n* > for s G Sq. Then x An * - x Aq * G 
J M Att by the assumption J MA?t : H se i s x s = I A . 

Since G is a Grobner basis for this ideal, one of the lead terms of the basis 
must divide the lead monomial x An . This means that the indices of positive 
coordinates of the exponents m,^ of the lead monomial must be included 
in the positive coordinates of n*. Since the corresponding coordinate values 
are or 1, the divisor must also divide n*. This contradicts its construction 
above as the normal form without divisors. Hence, it cannot be the case that 
n\ > q^. This proves that li<n\<q\ = L\. 

We show next the result that I2 < Li- Let m G J 4 _1 [t]. 

Use long division to compute the normal form of x m with respect to 
G2 ■= G fl Q[x<2, X3, . . . ,Xd], the elements of the subbasis that only involve 
coordinates 2,3, ... ,d. Let the normal form be the monomial x n , where 
n\ = mi, which has not changed in the division. It is nearly immediate 
that 112 > h, since the first coordinate of the normal form when dividing 
by a Grobner basis for the full ideal Ia 2 is h- 

Let q* solve L2 = min{(/2 : Aq = t,qi = mi, q G We show that q\ > 

n\, which together with n\ > I2 will prove the result L2 = h- 

Suppose by way of contradiction that n\ > q\. Since q* is rational, an 
integer multiple, say Aq*, is integral. Then A(Aq*) = A(Xn*), so x An — 
x Aq * G I a- Furthermore, by the assumption of positivity of coordinates Sq 
on elements in Aq [t], it follows that q*,n* > for s G Sq. Then x An * — 
x Aq * G I MA , t , since I M : n s€ / s x s = I A . Also, ^nl,...,n* d ) _ x \(o, q * 2 ,..., g * d ) £ 

Im a t : X T ^ Q[ x 2, ■ ■ ■ > x d] ■ By the assumption on Im a t : X T ^ follows that 

x A(0,n*,...,n*) _ x X(0,q*,...,q*) g ^ _ 

Since G2 is a lex Grobner basis for the ideal Im a t H Q[x2, ■ ■ ■ , xj\ by the 
elimination theorem, one of the lead terms of the basis G2 must divide the 
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lead monomial x A ( ' n 2'"'' n S), since we have just shown that this is the lead 
monomial in a binomial that belongs to lu A t H Q [x2, ■ ■ ■ , Xd] ■ This means 
that the indices of positive coordinates of the exponents mf of the lead 
monomial must be included in the positive coordinates of n*. Since the 
corresponding coordinate values are or 1, the divisor must also divide n*. 
This contradicts its construction above as the normal form without divisors. 
Hence, it cannot be the case that n\ > gj. This proves that I2 < < q\ = ^2- 
The remaining coordinates are proved similarly. □ 

There is a corresponding result for the upper bounds. Whereas the lex 
basis relates IP minimization to the normal form of a monomial, it is the 
grevlex basis that relates IP maximization to the normal form. We state 
the result below only for the first entry, since it must be applied repeatedly. 
Using the result requires recomputing a grevlex basis for each of the matrices 
Ai (containing columns i, i + 1, . . . , d from A) and rechecking the condition, 
because we cannot simply apply an elimination theorem on a single lex basis 
as before. 

Proposition 5.2. Suppose a Markov subbasis M^t = {±mi, • • • , 
has the property that G := {x 111 ^ — x m i ,i = 1, . . . , g} is a grevlex Grobner 
basis with ordering > x^-i > ■ ■ ■ > x\ on indeterminates for the generated 
ideal lM At - Also, suppose Im a t : ]lsGi" s x s = Ia, where Sq is the collection 

of coordinates which are always positive for elements in Aq 1 [t] . If the coor- 
dinate values of mf are in {0,1}, then ui(t) = Ui(t). 

PROOF. We show that Ui <ui. Let m £ A -1 [t]. 

Use long division to compute the normal form of x m with respect to the 
grevlex basis Im a t • Let the normal form be the monomial x n . It is nearly 
immediate that n\ < U\, since the exponent on x\ of the normal form when 
dividing by a grevlex Grobner basis with reversed indeterminate order for 
the full ideal I a is U\. 

Let q* solve U\ = maxjgi : Ac[ = t,q€ We show that q* < n\, which 

together with n\ < u\ will prove the result U\ <u\. 

Suppose by way of contradiction that q\ > n\. Since q* is rational, an 
integer multiple, say Aq*, is integral. Then A(Aq*) = A(An*), so x An — 
x Aq £ I a - Furthermore, by the assumption of positivity of coordinates Sq on 
elements in -Aq 1 ^], it follows that q*, n* > for s E Sq. Then x An * - x Aq * € 
lM A , t by the assumption I MA>t : I\ s &i Sn x s = Ia- 

Since G is a Grobner basis for this ideal, one of the lead terms of the basis 
must divide the lead monomial x An * . This means that the indices of positive 
coordinates of the exponents m,^ of the lead monomial must be included 
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in the positive coordinates of n*. Since the corresponding coordinate values 
are or 1, the divisor must also divide n*. This contradicts its construction 
above as the normal form without divisors. Hence it cannot be the case that 
q* > n*. This proves that U\ = q* < n\ < u\ . □ 

The corollary below, combining Propositions 5.1 and 5.2, applies directly 
to Examples 7.1, 7.2 and 7.4. 

Corollary 5.1. If a lex Grobner basis for I a has square-free expo- 
nents on the lead monomials, then lj = Lj for all j = 1, . . . ,d. If each grevlex 
Grobner basis for ,j = l,...,d, and indeterminate ordering Xd > x^-i > 
■ ■ ■ > x% has square-free exponents on the lead monomials, then Uj = Uj for 
all j = 1, . . . , d. 

Proof. The assumptions of Proposition 5.1 hold if Iht At = Ia, so the 
lower bounds from LP and IP are equal. For the upper bounds, the statement 
is a restatement of Proposition 5.2 for each step in the sequential sampling. 

□ 

6. Sampling distributions. Assume that the sequential interval property 
holds for a multiway table with given constraints, and that the intervals 
can be approximated by LP. The next question is how to sample from these 
intervals. Ideally, we want to sample a cell value from the true marginal 
distribution of a cell entry conditional on the entries that have already been 
sampled. However, these marginal distributions are quite difficult to compute 
explicitly except in very small examples. SIS samples from a simple proposal 
distribution (rather than the true distribution) on the set of all possible 
marginal values. 

For a target uniform distribution, which is useful for counting the total 
number of tables and some Bayesian applications, we propose a uniform 
distribution on the available interval for each cell, that is, p(x) = l/(u — Z + 1) 
on integers in the interval [/, u] . We call this the "uniform sampling method." 
With the length of the proposed sampling interval, the importance weights 
can be computed exactly for reweighting at the end. This strategy gives 
low cv 2 (< 5 for all examples we have tested) and works very well on the 
examples in Section 7. 

For a target hyper geometric distribution, which arises in conditional in- 
ference with multinomial sampling, we propose to sample a cell value from 
the hypergeometric distribution p(x) = (") ( 1+ ™_ X ) / (^) on the interval of 
available integers [l,u]. We call this the "hypergeometric sampling method," 
which is usually (but not always, see Example 7.4) better than the uniform 
sampling method when the target distribution is hypergeometric. This hy- 
pergeometric proposal does not give the exact hypergeometric target in the 
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end. It is just a reasonable marginal approximation. This method gives satis- 
factory results for examples in Section 7, although the cv 2 is not consistently 
small. For sparse tables, approximating the marginal mass function of the 
count in a single cell can be difficult. 

7. Examples. In the examples that follow, we sample sequentially from 
intervals computed with the LP approximation. The LP approximation is 
very close to or exactly equal to the IP range in all examples. In Example 7.1 
one can apply known results on Markov bases to avoid algebraic calculations, 
and the most basic results of Section 3 apply. In Example 7.2 one must do 
explicit algebraic calculations to verify the conditions of Section 3. We did 
a detailed numerical comparison with the Markov chain on Example 7.2. 
Example 7.3 (6-way Czech autoworker data) is one that requires the full 
theory of Markov subbases of Section 4 and consideration of the specific 
margin values to get sequential intervals under one model. We also study a 
second model for which we could not compute the Markov basis, and we see 
that SIS still works well. The no-3-way interaction model of Example 7.4 is 
a well-known example where the Grobner basis involves moves of size 2, and 
yet the sequential theory applies perfectly. Example 7.5 is a classic triangular 
genotype table, and it brings out the importance of checking different cell 
orders. In some orders the sequential interval property holds, and in other 
quite natural orders it does not, and this can be seen in the lex basis. Finally, 
Example 7.6 is an important application of sampling on lattice points that 
are not strictly speaking contingency tables. The work of Rapallo [27] on 
Markov bases and structural zeros may be useful for other examples. 

The starting point to verify the conditions of Sections 3, 4 and 5 for a 
particular example is to attempt to compute the toric ideal I a- For this we 
have used the toric library toric. lib in the free software Singular [19] and 
the groebner command in 4iz2 [21]. The software 4ti2 was used to construct 
constraint matrices for several examples. The operations of saturation and 
quotient (":") that figure in the results of Sections 4 and 5 were done quickly 
in Singular. 

In the following examples, all results are based on 1000 random samples 
using either the uniform sampling method or the hypergeometric sampling 
method. The code was written in R [26] and the software IpSolve was called 
from R. The running times range from several seconds to a few minutes on a 
2.0 GHz computer. When IP is used instead of LP, a computation typically 
takes hours, and sometimes it will not terminate in a reasonable amount of 
time. 

Example 7.1. Consider the 3-way case/control data (Table 1) in the 
4x4x2 table from the Ille-et-Verlaine cancer study of the age 35-44 group 
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Table 1 





Age 35-44 data on 


oesophage 


al cancer 


from [3] 












A 






1 


2 


3 


4 


R 


= T 


1 


60 


35 


11 


1 






2 


13 


20 


6 


3 






3 


7 


13 


2 


2 






4 


8 


8 


1 





R 


= 1 T 


1 











2 






2 


1 


3 












3 





1 





2 






4 















([3], Appendix I). The factors are Alcohol level (A), Tobacco level (T) and 
Response R, where R = is a control measurement and R = 1 is a case. 

The "case" outcomes are sampled with a multinomial distribution with 
probabilities p(a,t\l) on the Alcohol and Tobacco covariates. The "control" 
outcomes are also sampled with a multinomial distribution with probabili- 
ties p(a,t\0). With a retrospective model p(a,t\l) /p(a,t\0) = e aa+l3t of eight 
parameters, the appropriate margins to fix for conditional inference [treating 
p(a,t\0) as unknown nuisance parameters] are [A, T] (sum over case/control 
counts at each level), [A, R] and [T,R] (sums over other factor at each re- 
sponse level). The constraints imply that the Graver basis ([28], page 55) for 
the independence model on T and R is a Markov basis, and the Graver basis 
is equivalent to the collection of square-free circuit moves on one level of the 
Response factor. Thus, the results of Section 3 and Corollary 5.1 imply the 
property of sequential intervals and LP will give the exact integral interval 
bounds at each step. 

The simulation with LP gave 100% good tables. When the underlying 
distribution is uniform, the uniform sampling method gave cv 2 of 0.24 and 
estimated the total number of tables to be 25, a number confirmed by LattE 
in a total elapsed time of 7 seconds on a 2.8 GHz desktop. When the under- 
lying distribution is hypergeometric, the hypergeometric sampling method 
gave cv 2 of 0.5, and the estimated p-value for the exact goodness-of-fit test 
[defined by equations (1) and (2)] is 0.04. 

Example 7.2. Consider the 4-way abortion opinion data (Table 2) from 
[8], page 129. The observations are classified according to race, sex, age and 
opinion. There are three different opinions: yes means supporting legalized 
abortion, no means opposing legalized abortion, and the last one is unde- 
cided. 
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Table 2 

A-way abortion opinion data from [8] 



Race 


Sex 


Opinion 


18 25 


26 35 


36 45 


46-55 


56 65 


66+ 


White 


Male 


Yes 


96 


138 


117 


75 


72 


83 






No 


44 


64 


56 


48 


49 


60 






Undec. 


1 


2 


6 


5 


6 


8 




Female 


Yes 


140 


171 


152 


101 


102 


111 






No 


43 


65 


58 


51 


58 


67 






Undec. 


1 


4 


9 


9 


10 


16 


Nonwhite 


Male 


Yes 


24 


18 


16 


12 


6 


4 






No 


5 


7 


7 


6 


8 


10 






Undec. 


2 


1 


3 


4 


3 


4 




Female 


Yes 


21 


25 


20 


17 


14 


13 






No 


4 


6 


5 


5 


5 


5 






Undec. 


1 


2 


1 


1 


1 


1 



Christensen fits the log-linear model for the expected cell counts with all 
three-way interactions and all lower order terms. A shorthand notation for 
this model is to list its highest-order interaction terms: [RSO], [RSA], [ROA] 
and [SOA]. The conditional goodness-of-fit test for this model requires fixing 
all 3-way margins, [R, S, O], [R, S, A], [R, O, A] and [S, O, A]. The lex basis of 
165 elements is square-free in the lead monomials, so the sequential interval 
property holds by Section 3 and the IP and LP lower bounds are identical. A 
more detailed calculation to verify the conditions of Corollary 5.1 requires 
computing a grevlex basis for each of the submatrices of Ai, defined in 
Section 3 as the matrix that has columns i,i + 1, . . . , d from A. This can be 
done and the condition is verified, proving that LP and IP upper bounds 
are always the same. 

The LP method for finding the interval bounds gave 100% good tables in 
practice. When the underlying distribution is uniform, the uniform sampling 
method gave cv 2 of 2.92 and estimated the total number of tables to be 
9.1 x 10 7 . When the underlying distribution is hypergeometric, the value of 
cv 2 using the hypergeometric sampling method was around 102.9, and the 
estimated p-value for the exact goodness-of-fit test [defined by (1) and (2)] 
is 0.85 with standard error 0.1, based on 1000 tables which took about 5 
minutes in R on a 2.0 GHz computer. The MCMC algorithm generated 1000 
samples (with 1,000,000 samples as burn-in) in 224 minutes and estimated 
the p-value to be 0.84 with standard error 0.05. Thus, SIS is about 11 times 
faster than the MCMC algorithm for this example. 

The algebraic conditions for SIS with some models on this data are diffi- 
cult to verify. For example, 4ti2 runs for an hour on a 2.8 GHz Linux desktop 
with 1 GB of memory without completing the Markov basis calculation on 
the model [RS], [RA], [RO], [SO], [SA], [OA]. 
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Table 3 

6-way Czech autoworker data from [18] 



B no yes 



F E 


D 


C 


A no 


yes 


no 


yes 


Negative <3 


<140 


no 


44 


40 


112 


67 






yes 


129 


145 


12 


23 




>140 


no 


35 


12 


80 


33 






yes 


109 


67 


7 


9 


>3 


<140 


no 


23 


32 


70 


66 






yes 


50 


80 


(0) 7 


13 




>140 


no 


24 


2o 


73 


57 






yes 


51 


63 


7 


16 


Positive <3 


<140 


no 


5 


7 


21 


9 






yes 


(0)9 


17 


(0) 1 


(0)4 




>140 


no 


(0)4 


3 


11 


8 






yes 


14 


17 


5 


(0) 2 


>3 


<140 


no 


7 


(0)3 


14 


14 






yes 


9 


16 


(0)2 


(0) 3 




>140 


no 


(0)4 


(0)0 


13 


11 






yes 


(0)5 


14 


(0)4 


i 



Example 7.3. Consider the 6-way binary Czech autoworker data in Ta- 
ble 3 from a prospective study of probable risk factors for coronary throm- 
bosis [18]. There are 1,841 men in a car factory involved in the study. Here 
A, B, C, D, E and F indicate different risk factors. One reasonable model is 
given by [ACDEF], [ABDEF], [ABCDE] , [BCDF], [ABCF], [BCEF] [17]. The 
conditional goodness-of-fit test for this model requires fixing the three 5- way 
and the three 4-way margins in the above model representation. Implement- 
ing SIS for this example requires techniques beyond the basic methods of 
Section 3, because the lex basis does not have square-free lead exponents. 

In Table 3 (0) indicates that the LP lower bound for that cell entry is 
with the constraints from the model above; the others are strictly posi- 
tive. Identifying these cells is relevant when we apply Propositions 4.1, 5.1 
and 5.2, as the (0) cells form the complement of the set Sq (defined in 
Proposition 5.1). 

The lex basis for the toric ideal with lex order in indeterminates yields 20 
elements, the first of which has an exponent of 2 on the lead indeterminate 
^lliiil- Therefore, Proposition 3.1 cannot be applied directly. However, the 
ideal generated by the other 19 polynomials saturates in one step with re- 
spect to the monomial Ilses 3 ^; where S is the set of 41 coordinates that 
must be positive. Hence, by Lemma 4.2 these 19 moves are a Markov subba- 
sis. They are a lex Grobner basis for themselves, and they have the satura- 
tion property required in Proposition 4.1, so the sequential interval property 
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holds. Furthermore, Proposition 5.1 shows that the IP and LP lower bounds 
are always the same [which also implies that the (0) cells in the rationals are 
the same cells as those that could be in the integers]. Corollary 5.1 does 
not apply to show that the LP and IP upper bounds are the same, because 
exponents of 2 appear in the grevlex bases. We can use Proposition 5.2 on 
successive cells to show that LP and IP are the same after a few initial cells. 

If the cells are filled in across rows and then down, the order is 111111, 211111, 
121111, . . . (the order from 4ti2). The sequential interval property holds in 
this order as well. 

For this model, using LP for interval bounds gave 100% good tables. The 
shuttle algorithm gave 99% good tables with one iteration and 99.5% with 
two iterations. When the underlying distribution is uniform, the uniform 
sampling method gave cv 2 of 1.09 and estimated the total number of tables 
to be 841. The quantity cv 2 when targeting the hypergeometric distribution 
using the hypergeometric sampling method was 50.7, and the estimated p- 
value for the exact goodness-of-fit test [defined by (1) and (2)] is 0.27. Fitting 
this model in R using the loglin command gives a x 2 statistic of 5.8 on 4 
degrees of freedom, for a p-value of approximately 0.21. 

Consider the model of all 15 four-element constraints like [A, B,C,D], 
that is, all 4-way margins. We could not obtain the Markov basis for this 
model, but SIS still works well with cv 2 = 5.0 when the target distribution 
is uniform. LP gave 100% good tables, whereas the shuttle algorithm gave 
only 2% good tables after 10 iterations. 

Example 7.4. Consider the 3x3x3 example (Table 4) from [14], 
page 379, with a model of no-3-way interaction. The conditional goodness- 
of-fit test for this model requires fixing all "line sums." 

When ordered left to right across rows, Proposition 3.1 implies sequential 
intervals and Corollary 5.1 gives an IP/LP gap of at every step. In simu- 
lation LP gave 100% good tables, and the shuttle algorithm also gave 100% 
good tables after one iteration. 

When the underlying distribution is uniform, the uniform sampling method 
gave cv 2 of 2.08 and estimated the total number of tables to be 1.9 x 10 12 . 
This is consistent with the number 1,919,899,782,953 from LattE, computed 
in a total elapsed real time of 45 seconds on a 2.8 GHz desktop computer. 



Table 4 
3x3x3 table from [14] 



9 16 
85 52 
77 30 



41 
105 

38 



8 
35 
37 



8 
29 
15 



46 
54 
22 



11 14 
47 35 
25 21 



38 
115 
42 
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Table 5 
Rhesus data from [20] 





1236 














B 2 


120 


3 












B z 


18 
















B 4 


982 


55 


7 


249 








B 5 


32 


1 





12 









B 6 


2582 


132 


20 


1162 


29 


1312 




B r 


6 








4 





4 





B 8 


2 




















B 9 


115 


5 


2 


53 


1 


149 







Bi 


B 2 


Bz 


B 4 


B 5 


B e 


B 7 



B$ Bg 



When targeting the hypergeometric distribution, the hypergeometric sam- 
pling method gave cv 2 = 180.7. 

Example 7.5. Consider data in Table 5 of genotype pairs from [20]. 
The constraints for conditional goodness-of-fit test of Hardy- Weinberg pro- 
portions are the nine allele counts, which are nine linear functions that count 
twice the diagonal entry, so the A matrix has entries 0, 1 and 2. For sequen- 
tial sampling, the order of cells given by Table 6 leads to sequential intervals 
by Proposition 3.1. In general, for the genotype problem sampling across 
rows will not give intervals. 

The lead monomials in a lex basis have exponents that are all or 1, so 
LP gives the exact lower bounds by Proposition 5.1. For the upper bound, 
the grevlex condition of Proposition 5.2 does not hold from the first cell, but 
it does hold after a few cells, so IP and LP give the same bounds after some 
initial cells. The simulation with LP produced 100% good tables. See [23] 
for a direct sampling strategy and some further discussion of this example. 

Example 7.6. Consider a constraint matrix A of the form A = (Aq\I) 
with or 1 entries. Here / is the e x e identity matrix and Aq is size ex/ 



Table 6 



Order of cells 



1 
10 
11 
12 
13 
14 
15 
16 
17 



2 
18 
19 
20 
21 
22 
23 
24 



25 4 

26 31 

27 32 

28 33 

29 34 

30 35 



3 



36 6 

37 40 

38 41 

39 42 



5 



7 

43 
44 



8 
45 



9 
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with columns ax, .. -,a/- This occurs in a tomography problem introduced 
by Vardi [31], where A is a routing matrix for which routes between adja- 
cent vertices use the connecting edge, and the edge counts are put last as 
slack variables. The integer data y = Ax, where x are traffic counts between 
ordered pairs of nodes on a graph and y is the aggregate traffic across links. 
The sampling method of Tebaldi and West [30] for Bayesian computation 
of the posterior distribution is closely related to sequential sampling. Din- 
woodie [15] shows how fast sampling can be used in a Monte Carlo EM 
algorithm for estimating traffic rates. 

The property of sequential intervals holds for the entries of x under 
the constraint Ax = y in the order of the columns. With indeterminates 
wi,...,Wf for the first / columns and z\, . . . , z e for the last e slack variables, 
a lex Grobner basis in Q\w\, . . . , Wf, z±, ■ ■ ■ ,z e ] consists of the / binomials 
Wi — z a > . 

Linear programming will give the exact interval bounds at each step be- 
cause of the square-free lead monomials. The shuttle algorithm will also 
give the exact intervals in one step. The interval for the first cell is exactly 
[0, min{j. a . 1>0 i<i<f}{yi}] and the same type of problem recurs at each step 
!)•••)/■ ' 

It is possible to establish properties of SIS for some classes of examples, 
or, in other words, for some types of contingency tables with certain con- 
straints and margin values. This is an area of ongoing work, but at this time 
we can make some statements. Logistic regression tables with one integer 
covariate and positive column sums (at least one measurement at each level 
of the covariate) have the sequential interval property. This is proved in [7]. 
The subbasis that corresponds to differences of adjacent minors satisfies the 
conditions of Proposition 4.1. However, the IP/LP gap may not be zero. 

Also, two-way tables with structural zeros and fixed row and column sums 
have sequential intervals. The same algebraic technology also shows that 
case/control data with two factors, such as Example 7.1, has the sequential 
interval property. We conjecture that decomposable graphical models will 
have the sequential interval property under some order on the cells, but at 
this time a careful proof is not complete. 

8. Conclusion. We have described an efficient sequential importance sam- 
pling method for sampling multiway tables with given constraints. It can be 
used to approximate exact conditional inference on contingency tables. SIS 
sequentially builds up the proposal distribution by sampling table entries 
one by one. We have presented a theory that relates algebraic properties of 
collections of Markov moves to certain geometric properties of contingency 
tables. The geometric properties of "sequential intervals" and the relation- 
ship of IP to LP are important for the performance of sequential sampling. 
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Many real examples show that the theory is applicable and useful, and can 
be used in some examples when a Markov basis cannot be found. 

In practice, one may try sequential sampling even if the sequential interval 
property does not hold or if the algebraic conditions are not satisfied or not 
checked. If one can find rough bounds for each entry and design the proposal 
distribution carefully, so that the fraction of valid tables is high and the cv 2 
is low, SIS may still give satisfactory results. 

Further work is required to formulate a method to design the proposal 
distribution at each step. We have seen that the uniform sampling method 
works very well when the underlying distribution is uniform. However, when 
the target distribution is hypergeometric, the hypergeometric sampling method 
could be improved. 
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Proposition 4.1 holds with the weaker condition (Jm '■ ILeS xf m ^ n " ■ neA 

/ i and we think that further theoretical work on connectivity of Markov 
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